Document summarisation based on sentence ranking using vector space model
Abstract
The World Wide Web is a large repository of information, much of it in the form of unstructured documents, and selecting the documents of interest from such a huge pool is challenging. Text summarization is used to speed up document retrieval: documents are ranked based on the summary or abstract provided by their authors. This is not always possible, because not all documents come with an abstract or summary. Moreover, when different summarization tools are used to summarize a document, not all the topics covered in the document are reflected in its summary. In this chapter, a method to automate text document summarization is proposed, based on term frequency within the document at two levels: paragraph and sentence. To summarize the document, the similarity between paragraphs, and between sentences within a paragraph, is computed using the Vector Space Model. Evaluation of the proposed system on the standard DUC-2002 reference corpus using the ROUGE package indicates average recall, average precision and average F-measure comparable to those of existing summarization tools (Copernic, SweSum, Extractor, MSWord AutoSummarizer, Intelligent, Brevity and Pertinence), taking the DUC-2002 100-word human summaries as the baseline.
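The sentence-ranking idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact weighting or scoring rule, so this example assumes raw term-frequency vectors and scores each sentence by its cosine similarity to the paragraph that contains it. The function names (`tokenize`, `cosine`, `rank_sentences`, `summarize`) and the simple regex-based sentence splitter are hypothetical choices for illustration, not the authors' implementation.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_sentences(paragraph):
    """Return (score, position, sentence) tuples, best score first.

    Assumption: a sentence's score is its cosine similarity to the
    term-frequency vector of the whole paragraph.
    """
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
    paragraph_vec = Counter(tokenize(paragraph))
    scored = [
        (cosine(Counter(tokenize(s)), paragraph_vec), i, s)
        for i, s in enumerate(sentences)
    ]
    # Sort by descending score; ties keep the original sentence order.
    return sorted(scored, key=lambda item: (-item[0], item[1]))


def summarize(paragraph, k=1):
    """Pick the k top-ranked sentences and return them in document order."""
    top = rank_sentences(paragraph)[:k]
    return [s for _, _, s in sorted(top, key=lambda item: item[1])]
```

For example, `summarize("Cats chase mice. Cats sleep all day and cats purr. Dogs bark.", k=1)` selects the second sentence, because it shares the most term mass with the paragraph as a whole. A fuller system along the lines of the abstract would apply the same comparison at the paragraph level before ranking sentences.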
Similar resources
A new model for phrase search based on weighted minimum displacement
Finding high-quality web pages is one of the most important tasks of search engines. The relevance between the documents found and the query searched depends on the user observation and increases the complexity of ranking algorithms. The other issue is that users often explore just the first 10 to 20 results while millions of pages related to a query may exist. So search engines have to use sui...
Combining a mixture language model and Naive Bayes for multi-document summarisation
The TNO system for multi-document summarisation is based on an extraction approach. We combined two statistical methods for sentence selection with a variant of the MMR algorithm. After sentence segmentation, each sentence is scored on the basis of two probabilistic models. The first model scores sentences based on a (generative) unigram language model, which is a mixture of a cluster model, a ...
Modelling, Visualising and Summarising Documents with a Single Convolutional Neural Network
Capturing the compositional process which maps the meaning of words to that of documents is a central challenge for researchers in Natural Language Processing and Information Retrieval. We introduce a model that is able to represent the meaning of documents by embedding them in a low dimensional vector space, while preserving distinctions of word and sentence order crucial for capturing nuanced...
Multi-Document Summarisation Using Generic Relation Extraction
Experiments are reported that investigate the effect of various source document representations on the accuracy of the sentence extraction phase of a multidocument summarisation task. A novel representation is introduced based on generic relation extraction (GRE), which aims to build systems for relation identification and characterisation that can be transferred across domains and tasks withou...
Opinion-aware information management: statistical summarisation and knowledge representation of opinions
Nowadays, an increasing number of media platforms provide users with opportunities for sharing their opinions about products, companies or people. In order to support users accessing opinion-based information, and to support engineers building systems that require opinion-aware reasoning, intelligent opinion-aware tools and techniques are needed. This thesis contributes methods and technolog...
Journal: IJDMMM
Volume: 5
Publication date: 2013